30 research outputs found

    Relevance-Redundancy Dominance: a threshold-free approach to filter-based feature selection

    Feature selection is used to select a subset of relevant features in machine learning, and is vital for simplification, improving efficiency and reducing overfitting. In filter-based feature selection, a statistic such as correlation or entropy is computed between each feature and the target variable to evaluate feature relevance. A relevance threshold is typically used to limit the set of selected features, and features can also be removed based on redundancy (similarity to other features). Some methods are designed for use with a specific statistic or certain types of data. We present a new filter-based method called Relevance-Redundancy Dominance that applies to mixed data types, can use a wide variety of statistics, and does not require a threshold. Finally, we provide preliminary results from extensive numerical experiments on public credit datasets.
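
    As a rough illustration of the conventional threshold-based filter pipeline that the abstract describes (not the Relevance-Redundancy Dominance method itself, whose details are not given in this listing), the sketch below scores each feature by its absolute Pearson correlation with the target, keeps those above a relevance threshold, and then drops any feature that is highly correlated with one already kept. The function names, the choice of Pearson correlation, and both threshold values are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def filter_select(features, target, relevance_threshold=0.5, redundancy_threshold=0.9):
    """Threshold-based filter selection (illustrative; thresholds are assumed values).

    features: dict mapping feature name -> list of numeric values
    target:   list of numeric target values
    """
    # Relevance: absolute correlation between each feature and the target.
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}

    # Keep only features above the relevance threshold, most relevant first.
    candidates = sorted(
        (name for name in relevance if relevance[name] >= relevance_threshold),
        key=lambda name: -relevance[name],
    )

    # Redundancy: skip a candidate that is too similar to an already selected feature.
    selected = []
    for name in candidates:
        if all(
            abs(pearson(features[name], features[kept])) < redundancy_threshold
            for kept in selected
        ):
            selected.append(name)
    return selected
```

    Both thresholds have to be tuned by hand in a pipeline of this kind, which is the dependency the threshold-free method above is meant to remove.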

    Constraint acquisition and the data collection bottleneck

    The field of constraint acquisition (CA) aims to remove the “modelling bottleneck” by learning constraints from examples. However, it gives rise to a “data collection bottleneck” as humans must prepare a suitable (labelled) dataset. A recently published paper described an unsupervised CA method called MineAcq that can learn standard CA benchmarks. In this paper we summarise the results, and apply MineAcq to a new, noisy, unlabelled dataset that was not designed for CA.

    Robust constraint acquisition by sequential analysis

    Modeling a combinatorial problem is a hard and error-prone task requiring expertise. Constraint acquisition methods can automate this process by learning constraints from examples of solutions and (usually) non-solutions. We describe a new statistical approach based on sequential analysis that is orders of magnitude faster than existing methods, and gives accurate results on popular benchmarks. It is also robust in the sense that it can learn constraints correctly even when the data contain many errors.

    An analytics-based heuristic decomposition of a bilevel multiple-follower cutting stock problem

    This paper presents a new class of multiple-follower bilevel problems and a heuristic approach to solving them. In this new class of problems, the followers may be nonlinear, do not share constraints or variables, and are at most weakly constrained. This allows the leader variables to be partitioned among the followers. We show that current approaches for solving multiple-follower problems are unsuitable for our new class of problems and instead we propose a novel analytics-based heuristic decomposition approach. This approach uses Monte Carlo simulation and k-medoids clustering to reduce the bilevel problem to a single level, which can then be solved using integer programming techniques. The examples presented show that our approach produces better solutions and scales up better than the other approaches in the literature. Furthermore, for large problems, we combine our approach with the use of self-organising maps in place of k-medoids clustering, which significantly reduces the clustering times. Finally, we apply our approach to a real-life cutting stock problem. Here a forest harvesting problem is reformulated as a multiple-follower bilevel problem and solved using our approach. This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/228

    A Partial Taxonomy of Substitutability and Interchangeability

    Substitutability, interchangeability and related concepts in Constraint Programming were introduced approximately twenty years ago and have given rise to considerable subsequent research. We survey this work, classify, and relate the different concepts, and indicate directions for future work, in particular with respect to making connections with research into symmetry breaking. This paper is a condensed version of a larger work in progress. Comment: 18 pages, The 10th International Workshop on Symmetry in Constraint Satisfaction Problems (SymCon'10)

    Bounding the search space of the Population Harvest Cutting Problem with Multiple Size Stock Selection

    In this paper we deal with a variant of the Multiple Stock Size Cutting Stock Problem (MSSCSP) arising from population harvesting, in which some sets of large pieces of raw material (of different shapes) must be cut following certain patterns to meet customer demands of certain product types. The main extra difficulty of this variant of the MSSCSP lies in the fact that the available patterns are not known a priori. Instead, a given complex algorithm maps a vector of continuous variables called a values vector into a vector of total amounts of products, which we call a global products pattern. Modeling and solving this MSSCSP is not straightforward since the number of values vectors is infinite and the mapping algorithm consumes a significant amount of time, which precludes complete pattern enumeration. For this reason a representative sample of global products patterns must be selected. We propose an approach to bounding the search space of the values vector and an algorithm for performing an exhaustive sampling using such bounds. Our approach has been evaluated with real data provided by an industry partner.

    Generating difficult CNF instances in unexplored constrainedness regions

    When creating benchmarks for satisfiability (SAT) solvers, we need Conjunctive Normal Form (CNF) instances that are easy to build but hard to solve. A recent development in the search for such methods has led to the Balanced SAT algorithm, which can create k-CNF instances with m clauses of high difficulty, for arbitrary k and m. In this article, we introduce the No-Triangle CNF algorithm, a CNF instance generator based on the cluster coefficient graph statistic. We empirically compare the two algorithms by fixing the arity and the number of variables, but varying the number of clauses. We find that the hardest instances produced by each method belong to different constrainedness regions. In the 3-CNF case, for example, hard No-Triangle CNF instances reside in the highly-constrained region (many clauses), while Balanced SAT instances obtained from the same parameters are easy to solve. This allows us to generate difficult instances where existing algorithms fail to do so.
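
    To make the experimental setup concrete (fixed arity and number of variables, varying number of clauses), here is a minimal sketch of a plain uniform random k-CNF generator with DIMACS output. It implements neither Balanced SAT nor No-Triangle CNF; it only shows what a k-CNF instance with m clauses looks like. The function names and parameters are assumptions made for illustration.

```python
import random


def random_k_cnf(num_vars, num_clauses, k=3, seed=0):
    """Uniform random k-CNF: each clause has k distinct variables with random signs.

    Literals are nonzero integers; a negative value denotes a negated variable.
    """
    rng = random.Random(seed)
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), k)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses


def to_dimacs(num_vars, clauses):
    """Render clauses in DIMACS CNF format, which most SAT solvers accept."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    lines += [" ".join(str(lit) for lit in clause) + " 0" for clause in clauses]
    return "\n".join(lines)


if __name__ == "__main__":
    # Fix k and the number of variables, and vary the number of clauses.
    for m in (40, 85, 130):
        instance = random_k_cnf(num_vars=20, num_clauses=m, k=3, seed=m)
        print(to_dimacs(20, instance).splitlines()[0])
```

    Sweeping the clause count in this way moves instances across constrainedness regions, which is the axis along which the two generators are compared.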

    A grouping genetic algorithm for joint stratification and sample allocation designs

    Finding the optimal stratification and sample size in univariate and multivariate sample design is hard when the population frame is large. There are alternative ways of modelling and solving this problem, and one of the most natural uses genetic algorithms (GA) combined with the Bethel-Chromy evaluation algorithm. The GA iteratively searches for the minimum sample size necessary to meet precision constraints in partitionings of atomic strata created by the Cartesian product of auxiliary variables. We point out a drawback with classical GAs when applied to the grouping problem, and propose a new GA approach using “grouping” genetic operators instead of traditional operators. Experiments show a significant improvement in solution quality for similar computational effort.

    Classifier-based constraint acquisition

    Modeling a combinatorial problem is a hard and error-prone task requiring significant expertise. Constraint acquisition methods attempt to automate this process by learning constraints from examples of solutions and (usually) non-solutions. Active methods query an oracle while passive methods do not. We propose a known but not widely used application of machine learning to constraint acquisition: training a classifier to discriminate between solutions and non-solutions, then deriving a constraint model from the trained classifier. We discuss a wide range of possible new acquisition methods with useful properties inherited from classifiers. We also show the potential of this approach using a Naive Bayes classifier, obtaining a new passive acquisition algorithm that is considerably faster than existing methods, scalable to large constraint sets, and robust under errors.

    Solving a hard Cutting Stock Problem by machine learning and optimisation

    We are working with a company on a hard industrial optimisation problem: a version of the well-known Cutting Stock Problem in which a paper mill must cut rolls of paper following certain cutting patterns to meet customer demands. In our problem each roll to be cut may have a different size, the cutting patterns are semi-automated so that we have only indirect control over them via a list of continuous parameters called a request, and there are multiple mills each able to use only one request. We solve the problem using a combination of machine learning and optimisation techniques. First we approximate the distribution of cutting patterns via Monte Carlo simulation. Secondly we cover the distribution by applying a k-medoids algorithm. Thirdly we use the results to build an ILP model which is then solved.
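
    The first two stages of the pipeline above (Monte Carlo sampling, then covering the sample with k-medoids) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `sample_patterns` is a hypothetical stand-in for the semi-automated cutting process, the Manhattan distance and the simple alternating k-medoids scheme are illustrative choices, and the ILP stage is omitted because it would need an external solver.

```python
import random


def sample_patterns(n, dim, seed=0):
    """Hypothetical Monte Carlo stage: draw n random pattern vectors of length dim."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(dim)) for _ in range(n)]


def manhattan(p, q):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def k_medoids(points, k, iterations=20, seed=0):
    """Alternating k-medoids: assign points to the nearest medoid, then re-pick medoids.

    Returns k representative points, each of which is an actual member of `points`.
    """
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    for _ in range(iterations):
        # Assignment step: a medoid always stays in its own cluster.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            if i in clusters:
                nearest = i
            else:
                nearest = min(medoids, key=lambda m: manhattan(p, points[m]))
            clusters[nearest].append(i)

        # Update step: the new medoid minimises total distance within its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(manhattan(points[c], points[j]) for j in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return [points[m] for m in medoids]


if __name__ == "__main__":
    patterns = sample_patterns(n=200, dim=4, seed=1)
    for representative in k_medoids(patterns, k=5, seed=1):
        print(representative)
```

    Because every medoid is one of the sampled patterns, each representative corresponds to a pattern that was actually generated, which suits a later optimisation model that selects among the representatives.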